measurement
Measurement is the principled assignment of numbers to qualitities. And no matter what you measure or how you measure it, you’ll have some error.
Some tools tend to produce very little error (e.g., the length of this table in in inches). Other tools tend to produce more error. In the social sciences, our job is made harder by the fact that we often measure qualities that do not have an objective physical reality, like happiness or well-being. Measurement error is exacerbated when there is little available data.
Unfortunately, our statistic models will assume your measure has no error… unless you can tell the model how much error there is.
In measurement theory, we may assume that
\[ X_i = T_i + \epsilon_i \]
Meaning that for any observation \(i\), the observed score \(X\) on some measure is the sum of the true score \(T\) and error \(\epsilon\). Classical test theory assumes that \(\epsilon_i\) is randomly distributed, but other theories (IRT) disagree. Regardless, we can move forward with the assumption that observed scores are caused by some true score and some error.
mean sd 5.5% 94.5%
Location NaN NA NA NA
Loc NaN NA NA NA
Population 6.119600e+00 6.876156e+00 0.65780 1.897690e+01
MedianAgeMarriage 2.605400e+01 1.243630e+00 24.26950 2.826100e+01
Marriage 2.011400e+01 3.797905e+00 15.20850 2.649150e+01
Marriage.SE 1.399400e+00 7.969749e-01 0.54950 2.902200e+00
Divorce 9.688000e+00 1.820814e+00 6.66950 1.273050e+01
Divorce.SE 9.618000e-01 5.253675e-01 0.34085 1.893050e+00
WaffleHouses 3.234000e+01 6.578959e+01 0.00000 1.357450e+02
South 2.800000e-01 4.535574e-01 0.00000 1.000000e+00
Slaves1860 7.937834e+04 1.497309e+05 0.00000 4.355531e+05
Population1860 6.287293e+05 7.813127e+05 0.00000 1.903357e+06
PropSlaves1860 9.405132e-02 1.744486e-01 0.00000 4.561000e-01
histogram
Location
Loc
Population ▇▃▁▁▁▁▁▁
MedianAgeMarriage ▁▁▂▂▃▇▅▃▁▁▂▁▁▁
Marriage ▁▃▇▇▇▅▂▁▁▁
Marriage.SE ▁▇▅▃▁▂▁▁
Divorce ▂▃▅▅▇▂▃▁
Divorce.SE ▂▇▇▃▃▂▁▂▂▁▁▁
WaffleHouses ▇▁▁▁▁▁▁▁
South ▇▁▁▁▁▁▁▁▁▂
Slaves1860 ▇▁▁▁▁▁▁▁▁▁
Population1860 ▇▃▂▁▁▁▁▁
PropSlaves1860 ▇▁▁▁▁▁▁▁▁▁▁▁
dag_coords <-
tibble(name = c("A", "M", "D", "Dobs", "eD"),
x = c(1, 2, 2, 3, 4),
y = c(2, 3, 1, 1, 1))
dagify(M ~ A,
D ~ A + M,
Dobs ~ D + eD,
coords = dag_coords) %>%
tidy_dagitty() %>%
mutate(color = ifelse(name %in% c("D", "eD"), "a", "b")) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(aes(color = color),
size = 10, show.legend = F) +
geom_dag_text(parse = T, label = c("A", "D", expression(D[obs]),
"M",expression(italic(e)[D]))) +
geom_dag_edges() +
theme_void()p1 <- d %>%
ggplot(aes(x = MedianAgeMarriage,
y = Divorce,
ymin = Divorce - Divorce.SE,
ymax = Divorce + Divorce.SE)) +
geom_pointrange(shape = 20, alpha = 2/3, color="#1c5253") +
labs(x = "Median age marriage" ,
y = "Divorce rate")
p2 <-
d %>%
ggplot(aes(x = log(Population),
y = Divorce,
ymin = Divorce - Divorce.SE,
ymax = Divorce + Divorce.SE)) +
geom_pointrange(shape = 20, alpha = 2/3, color="#e07a5f") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("log population")
p1 | p2